# Gawk script collection for Twapperkeeper data processing - v1.0 (22 June 2011)
#
# released under Creative Commons (BY, NC, SA) by Axel Bruns - a.bruns@qut.edu.au

This collection contains Gawk scripts for processing Twapperkeeper / yourTwapperkeeper archives
of tweets: extracting, manipulating, and analysing their data, and preparing it for network
visualisation. For more information on setting up yourTwapperkeeper to provide data in the
required format, see:
http://www.mappingonlinepublics.net/2011/06/21/switching-from-twapperkeeper-to-yourtwapperkeeper/ 

The latest version of these scripts will always be available from:
http://www.mappingonlinepublics.net/resources/

All scripts are provided "as is", with no guarantee of accuracy or reliability. While every effort
has been made to ensure the reliability of these scripts, no warranty is given or implied.

If you publish research conducted using these scripts, please acknowledge them. The script
package can be cited as follows (modify for other bibliographic referencing schemes as required):

Axel Bruns and Jean Burgess. "Gawk Scripts for Twitter Processing." v1.0. _Mapping Online Publics_, 
22 June 2011. <http://mappingonlinepublics.net/resources/>.


INSTALLATION

All scripts should be placed in a central directory that is easily accessible from the command line.
The urlresolve.awk script requires the open-source tool wget to be installed and in the command path;
it also needs to be modified so that its 'path' variable points to the directory containing the
scripts. Relative and absolute paths are both acceptable; relative paths must be relative to the
directory from which the scripts will be executed. Paths must follow standard Windows, Mac, or Linux
notation as appropriate, and special characters must be escaped (e.g. each backslash in a Windows
path must be written as \\).
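
Inside an awk string literal, a backslash begins an escape sequence, so a Windows path assigned to
'path' needs every backslash doubled. As a minimal illustration (the directory C:\twitter\scripts\
and the function name are hypothetical), the snippet below prints the value such a literal actually
holds. It uses only POSIX awk features, so gawk or any other awk will run it.

```shell
# Hypothetical example: prints the value stored by an awk string literal
# in which every backslash of a Windows path has been doubled.
show_escaped_path() {
  awk 'BEGIN {
    path = "C:\\twitter\\scripts\\"   # each \\ is stored as a single backslash
    print path                        # prints C:\twitter\scripts\
  }'
}
```

Written with single backslashes instead, a sequence such as the \t in C:\twitter would be read as
a tab character, and the path would silently be wrong.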

Additionally, of course, it is assumed that Gawk is installed and in the command path.
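
On Mac or Linux, the POSIX 'command -v' shell builtin offers a quick way to confirm that Gawk and
wget are both on the command path. The function below is a minimal sketch, and its name is
hypothetical; on Windows, running 'gawk --version' and 'wget --version' at the command prompt
serves the same purpose.

```shell
# Reports whether each named command can be found on the command path.
# Returns status 1 if any of them is missing, 0 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, 'check_tools gawk wget' prints one line per tool and returns a non-zero status if
either of them is missing.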


KNOWN ISSUES

The Mac version of Gawk does not support the 'switch' statement, so atreplycount.awk and
multicount.awk will not work with it. Mac Gawk can be recompiled to include 'switch' (in Gawk 3.1.x,
'switch' is only available when Gawk is built with the --enable-switch configure option; from Gawk
4.0 onwards it is enabled by default). A future revision of these scripts will provide a workaround,
replacing 'switch' with 'if/else' constructions.
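
The general pattern of that workaround is to rewrite each 'switch' as a chain of if/else tests.
The sketch below contrasts the two forms; the tweet-classification rule and the function name
classify_tweets are hypothetical illustrations, not code taken from atreplycount.awk or
multicount.awk.

```shell
# Classifies each input line (one tweet text per line) as a retweet, @reply, or
# original tweet, using a simplified, hypothetical rule to illustrate the syntax.
#
# A Gawk build with 'switch' support could write the test as:
#   switch ($0) {
#   case /^RT @/:            type = "retweet"; break
#   case /@[A-Za-z0-9_]+/:   type = "reply";   break
#   default:                 type = "original"
#   }
# The if/else chain below behaves the same way but runs on any awk.
classify_tweets() {
  awk '{
    if ($0 ~ /^RT @/)               type = "retweet"
    else if ($0 ~ /@[A-Za-z0-9_]+/) type = "reply"
    else                            type = "original"
    print type
  }'
}
```

As with 'switch', the order of the tests matters: a retweet also contains an @username, so the
retweet test must come first.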

If used with the 'stats' command line argument, atreplycount.awk will produce a division-by-zero
error if any of the usernames specified with the 'search' command line argument did not tweet, or
received no @replies/RTs, in the dataset being processed. If this occurs, remove these usernames
from the command line argument. A fix will be made available in a future revision of these scripts.
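
The usual remedy for this kind of error is to test the denominator before dividing. The sketch
below illustrates such a guard on a hypothetical three-column input (username, tweets sent,
@replies received); the column layout and the function name reply_ratio are assumptions and do not
reflect the actual output of atreplycount.awk.

```shell
# Hypothetical input: username,tweets_sent,replies_received (no header row).
# Prints replies received per tweet sent, or "n/a" instead of dividing by zero.
reply_ratio() {
  awk -F , 'BEGIN { OFS = FS }
  {
    if ($2 + 0 == 0) print $1, "n/a"
    else             print $1, sprintf("%.2f", $3 / $2)
  }'
}
```

For example, the input lines 'alice,4,1' and 'bob,0,3' produce 'alice,0.25' and 'bob,n/a'.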


USAGE

A brief overview of the scripts and their respective functions is provided in the Quick Guide file
included in this archive.

Generally, scripts should be called as follows:

gawk -F , -f [script].awk [argument]="[parameters]" input.csv >output.csv
(for Twapperkeeper files in comma-separated value format)

or 

gawk -F \t -f [script].awk [argument]="[parameters]" input.tsv >output.tsv
(for Twapperkeeper files in tab-separated value format)
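
In these calls, [argument]="[parameters]" is an ordinary awk command-line variable assignment. One
detail matters when writing or modifying scripts: such assignments take effect after the BEGIN
block has run, just before the input file is read, so their values are not yet visible inside
BEGIN. The sketch below illustrates this with a hypothetical 'search' variable holding
space-separated usernames; the list format, the column used, and the function name are assumptions,
not the conventions of the actual scripts.

```shell
# Prints only the rows whose first field (a hypothetical username column) matches
# one of the space-separated names passed as the first shell argument.
filter_by_user() {
  awk -F , '
    NR == 1 {
      # search=... below is applied after BEGIN has run, so it is read here instead
      n = split(search, names, " ")
      for (i = 1; i <= n; i++) wanted[tolower(names[i])] = 1
    }
    (tolower($1) in wanted) { print }
  ' search="$1" -
}
```

For example: filter_by_user "alice carol" < input.csv > output.csv. Variables passed with
'gawk -v name=value', by contrast, are set before BEGIN runs.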

Some scripts take no command line arguments; others take several. All scripts can process both
comma- and tab-separated value (CSV/TSV) formats, and will usually return their results in the same
format as their input.
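
One straightforward way for an awk script to return results in the same format as its input is to
copy the field separator set with -F into the output field separator OFS. The sketch below shows
that technique; it is an assumption about how such scripts can work rather than a description of
these scripts' actual code, and select_fields is a hypothetical name.

```shell
# Prints the first and third fields of each record, joined by the same
# single-character separator that was used to split the input (e.g. "," or a tab).
select_fields() {
  awk -F "$1" 'BEGIN { OFS = FS } { print $1, $3 }'
}
```

For example, 'select_fields , < input.csv' keeps commas in the output, while passing a literal tab
character as the argument keeps tabs.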

The exceptions to this rule are atextractfromtoonly.awk, preparegexfattimeintervals.awk, and
gexfattimeintervals.awk: the first two always output CSV, while the last generates a GEXF file.
This is necessary because their output is intended for import into the network visualisation tool
Gephi, which reads data in CSV and GEXF formats.
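
GEXF is the XML-based graph format developed by the Gephi project: a graph element contains a list
of nodes and a list of edges. To show what such a file looks like, the sketch below writes a minimal
static, directed GEXF 1.2 file from a hypothetical 'from,to' edge list, turning repeated pairs into
edge weights. It illustrates the format only; it is not the code or output of gexfattimeintervals.awk
(which, as its name suggests, works with time intervals), and edges_to_gexf is a hypothetical name.

```shell
# Converts a hypothetical "from,to" edge list (one pair per line, no header)
# into a minimal static, directed GEXF 1.2 graph; repeated pairs become edge weights.
# Assumes names need no XML escaping (true of Twitter usernames: letters, digits, _).
edges_to_gexf() {
  awk -F , '
    NF < 2 { next }                                  # skip blank or malformed lines
    {
      if (!($1 in seen)) { seen[$1] = 1; node[++n] = $1 }
      if (!($2 in seen)) { seen[$2] = 1; node[++n] = $2 }
      key = $1 SUBSEP $2
      if (!(key in weight)) { src[++m] = $1; dst[m] = $2 }   # first time this pair appears
      weight[key]++
    }
    END {
      print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      print "<gexf xmlns=\"http://www.gexf.net/1.2draft\" version=\"1.2\">"
      print "  <graph mode=\"static\" defaultedgetype=\"directed\">"
      print "    <nodes>"
      for (i = 1; i <= n; i++)
        printf "      <node id=\"%s\" label=\"%s\"/>\n", node[i], node[i]
      print "    </nodes>"
      print "    <edges>"
      for (j = 1; j <= m; j++)
        printf "      <edge id=\"%d\" source=\"%s\" target=\"%s\" weight=\"%d\"/>\n",
          j - 1, src[j], dst[j], weight[src[j] SUBSEP dst[j]]
      print "    </edges>"
      print "  </graph>"
      print "</gexf>"
    }'
}
```

For example: edges_to_gexf < edges.csv > network.gexf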

Specific usage instructions for each script are included in the header at the top of the script.
To view these instructions, open the script in any text editor, such as NoteTab Light or Notepad.
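
On Mac or Linux, the header can also be printed directly at the command line. The sketch below
assumes that the instructions form a contiguous block of '#' comment lines (awk's comment syntax)
at the very top of each script; show_header is a hypothetical name.

```shell
# Prints the leading block of '#' comment lines at the top of a script file,
# stopping at the first line that is not a comment.
show_header() {
  awk '/^#/ { print; next } { exit }' "$1"
}
```

For example: show_header atreplycount.awk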

Sharing and modification of these scripts is expressly permitted. Note that a Creative Commons
licence (BY, NC, SA) applies in each case: you must acknowledge the original authors, must not
share or sell these scripts for commercial gain, and must share your modifications under the
same licence conditions.


For more information on these scripts and our research, see http://mappingonlinepublics.net/.